The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development 
of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When 
using blood as biomaterial for a MWAS the DNA is typically extracted directly from fresh or frozen whole blood that 
was collected via venous puncture. However, DNA extracted from dry blood spots may also be an alternative starting 
material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique in 
combination with next generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in 
the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. These data 
are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole 
blood. When investigating the sequence quality and the enrichment profile across biological features, we find that 
DNA extracted from blood spots gives results comparable to DNA extracted from whole blood. Only when the amount 
of starting material is < 0.5 µg DNA do we observe a slight decrease in assay performance. In conclusion, we show 
that high quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived 
dry blood spots without sacrificing quality and without bias in enrichment profile as long as the amount of starting 
material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide 
investigations with the MBD-seq approach. 



Introduction 

Epigenetic modifications to chromatin provide stability and 
diversity to the cellular phenotype. One of the most intensively 
studied modifications is the methylation of DNA cytosine resi- 
dues at the carbon 5 position. DNA methylation studies are a 
promising complement to genetic studies of variation in DNA 
sequence. First, because methylation can directly affect gene 
expression, it may capture additional individual variation in dis- 
ease susceptibility. 1 Indeed, dysregulation of DNA methylation 
has been associated with a wide variety of human diseases. 2-7 
Second, methylation can account for a wide variety of phenom- 
ena that characterize complex diseases 7,8 such as sex differences, 9,10 
genotype-environment interactions, 11,12 and age-related patterns 
associated with the disease course. 13 Third, as methylation is 
modifiable by environmental factors, including pharmaceutical 
interventions, 14,15 methylation sites are potentially important new 
drug targets. 16 
The potential importance of DNA methylation in the etiology 
of complex diseases has led to interest in methylome-wide 
association studies (MWAS) 17,18 aimed at interrogating all 
methylation sites in the human genome. 19 Such a genome-wide approach 
proved fruitful in the context of genome-wide association 
studies (GWAS) with SNPs where even the first generation of GWAS 
identified susceptible variants for common diseases. 20-23 

When using blood as biomaterial for a MWAS, the DNA is 
typically extracted directly from fresh or frozen whole blood that 
was collected via venous puncture. However, dry blood spots 
may also be an alternative starting material. 18 Indeed, studies 
show that in the context of targeted methylation studies, DNA 
extracted from dry blood spots gives highly reliable results. 24,25 
Two recent studies successfully used DNA from blood spots to 
investigate the methylation level of 27K and 450K CpGs, respec- 
tively, using arrays. 26,27 One of these studies 26 also used the enrich- 
ment based sequencing approach, MeDIP-seq, to investigate 
200 ng DNA from blood spots for two individuals. According 
to the authors the MeDIP-seq approach 
was successful. 26 However, there was no 
overlap between the samples used on 
the array and for MeDIP-seq. This lack 
of overlap, combined with the fact that 
only two assays were conducted, 
prevented a rigorous quality evaluation. 
Taken together, these 
studies indicate that DNA from blood 
spots may be useful for MWAS. 

There are a number of advantages 
to blood spots. Blood spots are 
collected from finger-pricks or, for infants 
and young children, from heel-pricks. 28 
This fairly non-invasive blood 
collection procedure poses minimal risk to 
the participant and may therefore be of 
particular interest for studies involving 
young children and infants, where 
regular venous punctures may be 
challenging. Other benefits of blood spots 
are that they are easy to store, ship and 
handle. For example, blood collected 
in tubes should ideally be stored at 
+4°C and processed as soon as possible 
to obtain maximum DNA yield and 
quality. In contrast, the collected blood 
spots can be dried at room 
temperature and shipped without refrigeration. 
Furthermore, blood spots can be kept 
in a long-term storage facility (< -20°C) 
for years prior to DNA extraction 
without significant loss in quality. 29,30 For 

example, DNA extracted from blood spots, stored for up to 
25 y, has successfully been used in whole genome amplification 
for various genetic investigations including direct sequencing 
and GWAS. 29-31 Finally, the simple collection procedure and the 
stability of the DNA in blood spots would, for example, allow 
the collection to be self-administered by adults without 
requiring the participants to visit a medical facility. This may be of 
particular use in large epidemiological studies where partici- 
pants are asked to give biosamples at multiple time points and/ 
or when study participants are geographically widespread. 

A challenge when using blood spots for MWAS is the low 
yield of DNA that typically can be extracted. Whole genome 
amplification, which is applied for GWAS to generate sufficient 
DNA from blood spots, cannot be applied for MWAS because 
the methylation signal is lost if DNA is directly amplified. In this 
study we first modified the DNA extraction protocol to allow for 
maximum yield of high quality DNA from a complete blood 
spot in one single reaction. Next, we conducted methylome-wide 
profiling using next-generation sequencing to evaluate the use of 
DNA from archived dry blood spots for MWAS. In this evalu- 
ation we compared methylome-wide data from DNA extracted 
from blood spots with DNA extracted from fresh whole blood. 32 
We also compared the effect of different amounts of starting 
Figure 1. Sequence quality of methylome-wide data. Data quality outcome variables (y-axis) are 
given for the samples from whole blood (averages from all samples are shown) and from all blood spots 
with 2 µg starting material (averages from all samples are shown) as well as for all conditions for the 
two samples (sample A and B) where different amounts of starting material were used (x-axis). 



material by performing multiple assays from the same individual 
starting with 0.25-2 µg DNA. 

Results 

We have used a modified protocol to extract DNA from a com- 
plete blood spot in a single reaction (see Supplemental Materials 
for details). Largely dependent on the diameter of the blood spots, 
which ranged from approximately 11-15 mm, we extracted 
0.8-1.6 µg genomic DNA per blood spot. The 260/280 ratio ranged 
from 1.6-2.3 with a mean ratio of 1.9 and a standard deviation 
of 0.26, suggesting a high degree of purity of the extracted DNA. 
Furthermore, the Bioanalyzer DNA 1000 kit (Agilent) showed 
that RNA contamination was not present in our DNA samples. 
Finally, the agarose gel electrophoresis showed that the DNA 
degradation was low with the majority of DNA from each sample 
being greater than 10 kb in size. 

To investigate the quality of the methylome-wide data we 
first studied possible differences in sequence reads between DNA 
extracted from blood spots vs. whole blood. Results are shown 
in Figure 1. The measures for the percentages of non-duplicate 
reads, correctly matched reads, and uniquely aligned reads vs. 
multireads were similar for DNA extracted from whole blood 



www.landesbioscience.com Epigenetics 



Figure 2. Overlap with biological features. The percentage of covered CpG sites overlapping with 
biological features is given for whole blood (averages from all samples are shown) and for 
all blood spots with 2 µg starting material (averages from all samples are shown) as well as for all 
conditions for the two samples (sample A and B) where different amounts of starting material were 
used (x-axis). Legend: RepeatMasker, Intron, Exon, 3'UTR, 5'UTR, CpG island, Isolated CpG. 



and from DNA extracted from blood spots when using 2 µg of 
starting material. For the percentage of aligned reads, blood spots 
performed slightly better (i.e., more aligned reads) than whole 
blood. 

Next, we investigated whether the amount of starting mate- 
rial affected sequence data quality. In Figure 1, the fairly straight 
lines for percentage aligned reads, uniquely mapped reads, and 
correct matches suggest that the amount of starting material does 
not affect the quality of the reads themselves. However, when 
using low amounts of starting material we observe more duplicate 
reads. 

We also studied whether there were systematic differences in 
terms of enrichment profiles. Figure 2 shows the percentage of 
covered CpGs overlapping with a number of biological features, 
including regions masked by RepeatMasker, introns, exons, 5' and 
3' untranslated regions, CpG-dense CpG islands and isolated CpG 
sites. The distribution of overlap was very similar between the 
samples that used DNA extracted directly from whole blood and the 
blood spot samples with 2 µg DNA starting material. This suggests 
that the type of starting material (blood spots vs. whole blood) 
does not dramatically affect the enrichment procedure of 
methylated fragments (Fig. 2). 

Neither did we observe any major 
differences in the distribution of enrichment 
profiles across biological features 
when altering the amount of starting 
material (Fig. 2, sample A and 
B). However, we noticed a lower ratio 
of percentage overlap (0.29 vs. 0.47, 
respectively) between CpG-rich regions 
(CpG islands) and CpG-poor regions 
(isolated CpGs) with lower amounts of 
starting material (0.5 µg) as opposed to 
higher amounts (e.g., 2.0 µg). 
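
The ratio reported above can be illustrated with a minimal sketch. It assumes the ratio is the percentage of covered CpGs in CpG islands divided by the percentage in isolated CpGs; the function name, the dictionary keys and the input percentages are hypothetical and are not values from this study.

```python
def island_to_isolated_ratio(pct_by_feature):
    """Ratio of percentage overlap: CpG islands over isolated CpGs.

    pct_by_feature maps a feature name to the percentage of covered CpGs
    overlapping that feature (hypothetical keys, chosen for illustration).
    """
    isolated = pct_by_feature["Isolated.CpG"]
    if isolated == 0:
        raise ValueError("percentage for isolated CpGs must be non-zero")
    return pct_by_feature["CpG.island"] / isolated


# Hypothetical percentages, for illustration only (not measured values).
example = {"CpG.island": 12.0, "Isolated.CpG": 40.0}
print(round(island_to_isolated_ratio(example), 2))
```

A lower ratio under this definition means that relatively fewer of the covered CpGs fall in CpG islands compared with isolated CpGs.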

Discussion 



In this study we found that the overall 
quality of methylome-wide investigations 
and the enrichment profile 
of the methylome using the MBD-seq 
approach are highly comparable for 
DNA extracted from blood spots and 
whole blood. Our data show that as 
little as ~1 µg of starting material can 
be used without sacrificing the quality 
of the sequencing data or biasing 
the methylation profile. However, 
if the starting material was < 0.5 µg 
we noticed a decrease in quality, as more 
duplicate reads were then present in the 
NGS data. The explanation may be that 
when fewer fragments (lower amount of 
starting material) are competing for the 
reagents, the likelihood that a single fragment 
generates multiple amplicons may increase. When the same 
amplicons are sequenced they yield reads starting at exactly 
the same location (duplicate reads). As these duplicate reads are 
excluded in the quality control they will not affect the final 
statistical results. However, avoiding sequencing duplicate reads to 
begin with saves resources and makes the sequencing procedure 
more cost effective. 
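
As an illustration of how duplicate reads of this kind can be counted, the following is a minimal sketch and not the quality control procedure used in this study. It assumes a duplicate is any read sharing chromosome, strand and start position with an earlier read; the function name and the example reads are hypothetical.

```python
from collections import Counter


def duplicate_fraction(reads):
    """Fraction of reads that repeat an already-seen alignment start.

    reads: iterable of (chromosome, strand, start) tuples. The first read
    at each location is kept; every additional read there is a duplicate.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = len(counts)
    return (total - unique) / total


# Hypothetical alignments: three reads share one start position.
reads = [
    ("chr1", "+", 100),
    ("chr1", "+", 100),
    ("chr1", "+", 100),
    ("chr1", "-", 100),
    ("chr2", "+", 500),
]
print(duplicate_fraction(reads))  # 2 of 5 reads are duplicates
```

With less starting material, a larger share of reads would be expected to fall into the duplicate class under this definition.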

We also noticed that the ratio of methylated fragments 
overlapping GC-rich and GC-poor regions shifted, suggesting that 
the enrichment profile was altered if the amount of starting 
material was < 0.5 µg. This difference potentially indicates that low 
amounts of starting material slightly bias the enrichment 
procedure, as it may favor extraction of fragments from densely 
methylated regions. This suggests that the amount of starting material 
is important, as less starting material might increase the variation 
between samples due to less complete enrichment for methylated 
fragments. Therefore, insufficient amounts of starting material may 
cause false positive findings in MWAS. 

Enrichment based methylome-wide profiling using an approach 
similar to MBD-seq called MeDIP-seq, where an antibody is used 
instead of the MBD protein to capture the methylated fraction of 
Table 1. Study samples and amount of starting material used for each methylome-wide profiling 

Individual  No. spots  Years stored  DNA extracted (µg)  2.0 µg  1.0 µg  0.5 µg  0.25 µg 
1           2          18            2.0                 x 
2           3          18            3.9                 x       x       x       x 
3           2          19            2.6                 x               x 
4           3          16            2.7                 x 

"No. spots" is the total number of spots from which DNA was extracted. Years stored indicates for how long the blood spots have been stored prior 
to DNA extraction. DNA extracted indicates the total amount of DNA extracted. The amount of starting material for each methylome investigation is 
indicated with an x. 



the genome, has been reported using as little as 200 ng of DNA 
from blood spots. 26 The authors performed the MeDIP assay in 
two different individuals without any replicate assays and, thus, the 
quality of their data could not be properly investigated. Therefore, 
before performing large-scale investigations with MeDIP-seq, the 
quality of the data when using blood spots and the boundaries of 
the amount of starting material from blood spots should be 
evaluated in a similar way as has been done for MBD-seq. 

It is important to note that while 1 µg can be considered a 
fairly small amount of DNA, it is typically the entire DNA yield 
extracted from a complete blood spot. Therefore, if the amount 
of biomaterial is very limited, a complete blood spot may still be 
considered too valuable for a single investigation. On the other 
hand, the knowledge that high quality methylome-wide assays 
can be conducted with DNA from blood spots opens the 
opportunity to collect blood spots instead of whole blood in new 
sample collections. 

In conclusion, our study suggests that for the vast majority of 
standard blood spots, one blood spot will suffice for methylome- 
wide investigation using MBD-seq. Furthermore, we show that 
high quality methylome-wide investigations using MBD-seq can 
be conducted in DNA from archived blood spots without sacrifice 
in sequence quality or methylome-wide enrichment profile. These 
results in combination with the simple collection procedure, the 
stability of the DNA and the straightforward handling and stor- 
age requirements for blood spots enable large-scale methylome- 
wide investigations in new and existing sample collections for a 
large set of phenotypes. 

Materials and Methods 

Study design. In this study we investigated 2-3 blood spots/ 
individual ascertained at a single time point from four 
individuals. The oldest blood spots were collected 19 y prior to DNA 
extraction. With this DNA we performed methylome-wide 
profiling using a next generation sequencing based approach. 32 To 
evaluate if DNA extracted from blood spots is equivalent to 
DNA extracted directly from fresh whole blood, we investigated 
measures of data quality and deviations in the enrichment profile 
across biological features. As starting material for the blood spots 
we used 2 µg of DNA/individual and compared this with 
methylome-wide data from whole blood where 2-5 µg DNA/individual 
was used. 

To study the lower bound for the amount of required starting 
material, methylation assays were performed multiple times with 



different amounts of DNA from the same individuals. Using 
DNA from the same individual ensures that any detected differ- 
ences are caused by technical differences (i.e., amount of start- 
ing material) and not by biological variation. A summary of the 
study design with the amount of starting material used for each 
methylome-wide profiling from blood spots is shown in Table 1. 
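
The design summarized in Table 1 can be checked with a short sketch. The values below are transcribed from Table 1 as read from its layout (in particular, the second assay for individual 3 is taken to be the 0.5 µg condition); the variable and function names are hypothetical.

```python
# individual -> (total DNA extracted in ug, starting amounts of each assay in ug)
# Values transcribed from Table 1; names are illustrative only.
design = {
    1: (2.0, [2.0]),
    2: (3.9, [2.0, 1.0, 0.5, 0.25]),
    3: (2.6, [2.0, 0.5]),
    4: (2.7, [2.0]),
}


def summarize(design):
    """Return the number of assays and whether each individual's assays
    fit within the amount of DNA extracted for that individual."""
    n_assays = sum(len(inputs) for _, inputs in design.values())
    within_budget = {
        individual: sum(inputs) <= extracted
        for individual, (extracted, inputs) in design.items()
    }
    return n_assays, within_budget


n_assays, within_budget = summarize(design)
print(n_assays)                     # total number of methylomes assayed
print(all(within_budget.values()))  # every assay set fits the extracted DNA
```

Under this reading the design comprises eight methylomes, consistent with the number stated in the abstract.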

Blood spot samples. The blood spots used in this 
investigation were collected as part of the Great Smoky Mountains Study, 
which is a longitudinal study of the development of psychiatric 
disorders in youth in western North Carolina, USA, 33 
that started in 1993. A trained staff member collected blood from 
two finger-pricks (yielding 10 spots total per visit) onto Schleicher 
and Schuell (S&S) filter paper number 903. 34 The samples were 
dried at room temperature and temporarily (less than two weeks) 
stored in refrigeration before being shipped (without refrigeration) to 
the laboratory for long-term storage at -28°C. This protocol is 
consistent with the rigorous quality control program developed 
for newborn screening programs. 28 All procedures were approved 
by ethical committees in the US, and all subjects provided written 
informed consent (or legal guardian consent and subject assent). 

DNA from the blood spots was extracted using a modified 
version of the QIAamp DNA Mini Kit (QIAGEN) protocol. By altering 
the buffer volumes and the incubations, the modified protocol 
allows for DNA extraction from a complete blood spot within a 
single reaction column without sacrificing DNA yield. Full details 
of the modifications are given in the Supplemental Materials. 
The quantity and quality of the extracted DNA were investigated 
with a NanoDrop 1000 (Thermo Scientific), by running the DNA on a 1% 
agarose gel and by using the Bioanalyzer DNA 1000 kit (Agilent). 

Whole blood samples. The methylome-wide data from DNA 
extracted directly from fresh whole blood used in this study 
originates from an existing methylome-wide investigation of 
1,500 schizophrenia case-control samples. 32 These samples were 
assayed, with high consistency in sequence data quality and 
distribution of methylation signal, using 2-5 µg of DNA as starting 
material (mean = 4.1 µg). 

Methylome-wide profiling. We used MethylMiner 
(Invitrogen), which employs methyl-CpG binding domain 
(MBD) protein to enrich for the methylated genomic DNA frac- 
tion, followed by single-end next-generation sequencing (NGS) 
on the Applied Biosystems SOLiD platform (Life Technologies). 
Methods were standard and based upon manufacturers' 
recommendations. With the exception that blood spots were sequenced 
using the SOLiD 5500xl instrument and whole blood samples 
were sequenced using SOLiD4 instruments, the same protocol 
was applied for both sample types. Briefly, genomic DNA was 
fragmented with ultra-sonication to a median fragment size of 
150 bp. The methylated fraction of the genome was extracted 
using MethylMiner and used as input material for NGS SOLiD 
libraries. The samples were barcoded and pooled in equal 
molarities prior to emulsion PCR and attachment to beads. The beads 
were deposited onto slides and the first 50 bp of each fragment were 
sequenced. This enrichment in combination with NGS 
(MBD-seq) has already been demonstrated to be highly specific, sensitive 
and applicable to identify differentially methylated regions. 35-40 

We have recently developed a data analysis pipeline that is 
specifically designed for MBD-seq. 32 In the present 
investigation we followed the analysis and quality control steps from 
that pipeline. In short, the sequenced reads were aligned to the 
human genome (build hg19/GRCh37) using BioScope 1.2 (Life 
Technologies). In the case of MBD-seq, only fragments with 
methylated CpGs can be extracted. Given that we know exactly 
where the CpGs are located, there is no need to search for read 
peaks to find methylated sites. 41,42 We therefore simply calculated 
coverage for the ~27 million autosomal CpG sites in the reference 
genome (hg19/GRCh37). A standard procedure is to count the 
sequence reads covering the CpG. Because the methylation of any 
CpG in the entire fragment could lead to its capture, the read 
length is sometimes extended to the expected fragment length. 
However, this procedure can be imprecise for three reasons: not 
all fragments have exactly the same size, fragment sizes may vary 
between samples, and the fragment pool obtained after shearing 
may not be identical to the pool that gets successfully sequenced 
(e.g., smaller fragments may be more likely to get extracted by 
the enrichment protocol). Thus, rather than assuming an identical 
pre-determined fragment size for all fragments and samples, we estimated 
the fragment size distribution for each sample from the empirical 
sequencing data. 43 The sample specific estimated fragment size 
distributions were used to calculate the probability for each read 
that the fragment it is tagging covers the CpG under 
consideration. Coverage for each CpG can then be calculated by taking 
the sum of the probabilities that all fragments in its neighborhood 
cover the CpG. 
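
The coverage calculation described above can be sketched as follows. This is a simplified illustration and not the published pipeline: it assumes forward-strand reads only, treats the CpG as a single position, and takes the fragment size distribution as a given dictionary rather than estimating it from sequencing data. All names and numbers are hypothetical.

```python
def coverage_probability(distance, fragment_size_probs):
    """Probability that a fragment starting at the read start is long
    enough to include a position `distance` bp downstream.

    fragment_size_probs maps fragment length (bp) to its probability.
    A fragment of length L covers offsets 0 .. L-1, so it must be
    longer than `distance`.
    """
    return sum(p for size, p in fragment_size_probs.items() if size > distance)


def cpg_coverage(cpg_pos, read_starts, fragment_size_probs):
    """Estimated coverage of one CpG: the sum over nearby reads of the
    probability that the tagged fragment covers the CpG."""
    max_size = max(fragment_size_probs)
    total = 0.0
    for start in read_starts:
        distance = cpg_pos - start
        # Reads downstream of the CpG, or too far upstream, contribute nothing.
        if 0 <= distance < max_size:
            total += coverage_probability(distance, fragment_size_probs)
    return total


# Hypothetical sample-specific fragment size distribution and read starts.
size_probs = {100: 0.25, 150: 0.5, 200: 0.25}
print(cpg_coverage(1160, [1000, 1120, 1300], size_probs))
```

In this example the read starting 40 bp upstream contributes a probability of 1, the read starting 160 bp upstream contributes only the probability of the 200 bp size class, and the read starting downstream contributes nothing.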

CpG sites in loci that are problematic in terms of alignment 
need to be eliminated prior to analyses, as coverage estimates will 
be confounded with alignment errors. Based on the results from a 
previous in silico experiment 32 using exactly the same alignment 
parameters as applied in this investigation we eliminated CpG 
sites with known alignment problems. 